First thing we want to do is to load data from A1 and A2, and install all required packages for this assignment.
Assignment 1 was focused on cleaning data by removing outliers, mapping gene id to HUGO gene symbols, and normalizing data with TMM method. Gene set information are shown as below
description for dataset .
# First, we want to get the description for dataset GSE145412.
gse <- getGEO("GSE145412",GSEMatrix=FALSE)
kable(data.frame(head(Meta(gse))), format = "html")
| contact_address | contact_city | contact_country | contact_email | contact_institute | contact_name |
|---|---|---|---|---|---|
| M. Sklodowskiej-Curie 24A | Bialystok | Poland | magdalena.paczkowska@umb.edu.pl | Medical University of Bialystok | Magdalena,,Paczkowska-Abdulsalam |
information associated with the dataset.
# Then, we want to get information about the dataset
current_gpl <- names(GPLList(gse))[1]
current_gpl_info <- Meta(getGEO(current_gpl))
## File stored at:
## /tmp/RtmpJ1ickg/GPL18573.soft
Platform Title: Illumina NextSeq 500 (Homo sapiens)
Original submission date: Apr 15 2014
Last update date: Mar 26 2019
Organism: Homo sapiens
No. of GEO datasets that use this technology: 5133
No. of GEO samples that use this technology: 204776
normalized data are in the format below
A1_normalized = readRDS("normalized_data.rds")
kable(A1_normalized[1:5,1:5], type="html")
| sample_1 | sample_4 | sample_5 | sample_6 | sample_7 | |
|---|---|---|---|---|---|
| 7SK | 1.8665112 | 0.4927562 | 2.3163723 | 0.6257193 | 0.4934407 |
| A1BG-AS1 | 0.2201429 | 0.2269133 | 0.2236347 | 0.1269422 | 0.1814833 |
| A2M-AS1 | 0.0419011 | 0.0575317 | 0.0302224 | 0.0658098 | 0.0579427 |
| A3GALT2 | 0.0448311 | 0.0154599 | 0.0187952 | 0.0399286 | 0.0213537 |
| AAAS | 0.4379074 | 0.4786196 | 0.4105074 | 0.5047566 | 0.4477859 |
The normalized data from Assignment #1 was ranked according to differential expression in Assignment 2. Then, we split up-regulated genes and down-regulated genes and store then into two files. After that, threshold over-representation analysis (g:Profiler) was performed on this data.and the top results are being analyzed.
The results are shown as below
Figure 1: differential expression genes overview
After that, we saved the rank list into a text file under data folder. Now we need to convert this text file to a rnk file in order to run GSEA.
temp_data <- read.delim("data/mets_ranked_genelist.txt", header = FALSE)
write.table(x=temp_data, file=file.path("data","mets_ranked_genelist.rnk"),sep = "\t", row.names = FALSE,col.names = FALSE,quote = FALSE)
kable(temp_data[1:5,1:2], type="html")
| V1 | V2 |
|---|---|
| MYL9 | -3.672864 |
| PF4 | -3.473816 |
| NRGN | -3.172615 |
| S100A11 | -2.940113 |
| OAZ1 | -2.769050 |
Now we have a rnk file, next step is to run GSEA program
#path to GSEA jar
gsea_jar <- params$gsea_jar
java_version <- params$java_version
#Gsea takes a long time to run. If you have already run GSEA manually or previously there is no need to re-run GSEA. Make sure the
# gsea results are in the current directory and the notebook will be able to find them and use them.
run_gsea = params$run_gsea
#navigate to the directory where you put the downloaded protocol files.
working_dir <- params$working_dir
gsea_directory = params$gsea_directory
analysis_name <- params$analysis_name
rnk_file <- params$rnk_file
expression_file <- params$expression_file
classes_file <- params$classes_file
is_docker <- params$is_docker
gmt_url = "http://download.baderlab.org/EM_Genesets/current_release/Human/symbol/"
#list all the files on the server
filenames = getURL(gmt_url)
tc = textConnection(filenames)
contents = readLines(tc)
close(tc)
#get the gmt that has all the pathways and does not include terms inferred from electronic annotations(IEA)
#start with gmt file that has pathways only
rx = gregexpr("(?<=<a href=\")(.*.GOBP_AllPathways_no_GO_iea.*.)(.gmt)(?=\">)",
contents, perl = TRUE)
gmt_file = unlist(regmatches(contents, rx))
dest_gmt_file <- file.path(working_dir,paste("Supplementary_Table3_",gmt_file,sep="") )
download.file(
paste(gmt_url,gmt_file,sep=""),
destfile=dest_gmt_file
)
run GSEA with parameters. set maximum geneset size of 200, minimum geneset size of 15 and gene set permutation
if(run_gsea && java_version == "11"){
command <- paste("",gsea_jar, "GSEAPreRanked -gmx", dest_gmt_file, "-rnk" ,file.path(working_dir,rnk_file), "-collapse false -nperm 1000 -scoring_scheme weighted -rpt_label ",analysis_name," -plot_top_x 20 -rnd_seed 12345 -set_max 200 -set_min 15 -zip_report false -out" ,working_dir, " > gsea_output.txt",sep=" ")
system(command)
} else if (run_gsea) {
command <- paste("java -Xmx1G -cp",gsea_jar, "xtools.gsea.GseaPreranked -gmx", dest_gmt_file, "-rnk" ,file.path(working_dir,rnk_file), "-collapse false -nperm 1000 -permute gene_set -scoring_scheme weighted -rpt_label ",analysis_name," -num 100 -plot_top_x 20 -rnd_seed 12345 -set_max 200 -set_min 15 -zip_report false -out" ,working_dir, "-gui false > gsea_output.txt",sep=" ")
system(command)
}
get output directory
if(gsea_directory == ""){
gsea_directories <- list.files(path = working_dir, pattern = "\\.GseaPreranked")
#get the details on the files
details = file.info(file.path(getwd(),working_dir,gsea_directories))
#order according to newest to oldest
details = details[with(details, order(as.POSIXct(mtime),decreasing = TRUE)), ]
#use the newest file:
gsea_output_dir <- row.names(details)[1]
} else {
gsea_output_dir <- gsea_directory
}
All the result are stored in MetS.GseaPreranked folder under data directory. To obtain all the result for positive or negative. Below shows the top hits for postive and negative result.
gsea_dir <- file.path(getwd(),"data", gsea_directories[1])
#get the gsea result files
gsea_results_files <- list.files(path = gsea_dir,
pattern = "gsea_report_*.*.tsv")
#there should be 2 gsea results files
enr_file1 <- read.table(file.path(gsea_dir,
gsea_results_files[grepl("neg", gsea_results_files)]),
header = TRUE, sep = "\t", quote="\"",
stringsAsFactors = FALSE,row.names=1)
enr_file2 <- read.table(file.path(gsea_dir,
gsea_results_files[grepl("pos", gsea_results_files)]),
header = TRUE, sep = "\t", quote="\"",
stringsAsFactors = FALSE,row.names=1)
knitr::kable(enr_file1[1:20,c(1,4:8)], "html",
caption = "Top Down Regulated Enrichment Analysis Result")
| GS.br..follow.link.to.MSigDB | ES | NES | NOM.p.val | FDR.q.val | FWER.p.val | |
|---|---|---|---|---|---|---|
| NEUTROPHIL MIGRATION%GOBP%GO:1990266 | NEUTROPHIL MIGRATION%GOBP%GO:1990266 | -0.8829091 | -1.907374 | 0 | 0.0000000 | 0.000 |
| NEUTROPHIL CHEMOTAXIS%GOBP%GO:0030593 | NEUTROPHIL CHEMOTAXIS%GOBP%GO:0030593 | -0.8897292 | -1.895108 | 0 | 0.0000000 | 0.000 |
| ANTIMICROBIAL HUMORAL IMMUNE RESPONSE MEDIATED BY ANTIMICROBIAL PEPTIDE%GOBP%GO:0061844 | ANTIMICROBIAL HUMORAL IMMUNE RESPONSE MEDIATED BY ANTIMICROBIAL PEPTIDE%GOBP%GO:0061844 | -0.8848908 | -1.869164 | 0 | 0.0002544 | 0.001 |
| GRANULOCYTE CHEMOTAXIS%GOBP%GO:0071621 | GRANULOCYTE CHEMOTAXIS%GOBP%GO:0071621 | -0.8671735 | -1.857884 | 0 | 0.0008142 | 0.004 |
| ANTIMICROBIAL PEPTIDES%REACTOME%R-HSA-6803157.2 | ANTIMICROBIAL PEPTIDES%REACTOME%R-HSA-6803157.2 | -0.8803539 | -1.853803 | 0 | 0.0010194 | 0.006 |
| GRANULOCYTE MIGRATION%GOBP%GO:0097530 | GRANULOCYTE MIGRATION%GOBP%GO:0097530 | -0.8572405 | -1.848430 | 0 | 0.0013104 | 0.009 |
| ANTIMICROBIAL HUMORAL RESPONSE%GOBP%GO:0019730 | ANTIMICROBIAL HUMORAL RESPONSE%GOBP%GO:0019730 | -0.8383114 | -1.844298 | 0 | 0.0011466 | 0.009 |
| MICROGLIA PATHOGEN PHAGOCYTOSIS PATHWAY%WIKIPATHWAYS_20210210%WP3937%HOMO SAPIENS | MICROGLIA PATHOGEN PHAGOCYTOSIS PATHWAY%WIKIPATHWAYS_20210210%WP3937%HOMO SAPIENS | -0.9002324 | -1.833128 | 0 | 0.0018092 | 0.016 |
| SELENIUM MICRONUTRIENT NETWORK%WIKIPATHWAYS_20210210%WP15%HOMO SAPIENS | SELENIUM MICRONUTRIENT NETWORK%WIKIPATHWAYS_20210210%WP15%HOMO SAPIENS | -0.8663569 | -1.827882 | 0 | 0.0016283 | 0.016 |
| CELL KILLING%GOBP%GO:0001906 | CELL KILLING%GOBP%GO:0001906 | -0.8421238 | -1.824320 | 0 | 0.0015736 | 0.017 |
| PLATELET DEGRANULATION%REACTOME DATABASE ID RELEASE 75%114608 | PLATELET DEGRANULATION%REACTOME DATABASE ID RELEASE 75%114608 | -0.8028120 | -1.820281 | 0 | 0.0015278 | 0.018 |
| RESPONSE TO ELEVATED PLATELET CYTOSOLIC CA2+%REACTOME%R-HSA-76005.2 | RESPONSE TO ELEVATED PLATELET CYTOSOLIC CA2+%REACTOME%R-HSA-76005.2 | -0.7953726 | -1.816867 | 0 | 0.0016461 | 0.021 |
| MYELOID LEUKOCYTE MIGRATION%GOBP%GO:0097529 | MYELOID LEUKOCYTE MIGRATION%GOBP%GO:0097529 | -0.8314753 | -1.815126 | 0 | 0.0015285 | 0.021 |
| TYROBP CAUSAL NETWORK%WIKIPATHWAYS_20210210%WP3945%HOMO SAPIENS | TYROBP CAUSAL NETWORK%WIKIPATHWAYS_20210210%WP3945%HOMO SAPIENS | -0.8229974 | -1.813868 | 0 | 0.0014946 | 0.022 |
| LEUKOCYTE CHEMOTAXIS%GOBP%GO:0030595 | LEUKOCYTE CHEMOTAXIS%GOBP%GO:0030595 | -0.8142357 | -1.808093 | 0 | 0.0016569 | 0.026 |
| COFACTOR CATABOLIC PROCESS%GOBP%GO:0051187 | COFACTOR CATABOLIC PROCESS%GOBP%GO:0051187 | -0.8381607 | -1.803955 | 0 | 0.0019194 | 0.032 |
| ROS AND RNS PRODUCTION IN PHAGOCYTES%REACTOME%R-HSA-1222556.9 | ROS AND RNS PRODUCTION IN PHAGOCYTES%REACTOME%R-HSA-1222556.9 | -0.8726843 | -1.802897 | 0 | 0.0018694 | 0.033 |
| CELL REDOX HOMEOSTASIS%GOBP%GO:0045454 | CELL REDOX HOMEOSTASIS%GOBP%GO:0045454 | -0.8497583 | -1.801876 | 0 | 0.0018250 | 0.034 |
| PLATELET DEGRANULATION%GOBP%GO:0002576 | PLATELET DEGRANULATION%GOBP%GO:0002576 | -0.7945729 | -1.799563 | 0 | 0.0020400 | 0.040 |
| DISRUPTION OF CELLS OF OTHER ORGANISM%GOBP%GO:0044364 | DISRUPTION OF CELLS OF OTHER ORGANISM%GOBP%GO:0044364 | -0.9143146 | -1.797062 | 0 | 0.0022332 | 0.046 |
knitr::kable(enr_file2[1:20,c(1,4:8)], "html",
caption = "Top Up Regulated Enrichment Analysis Result")
| GS.br..follow.link.to.MSigDB | ES | NES | NOM.p.val | FDR.q.val | FWER.p.val | |
|---|---|---|---|---|---|---|
| NEGATIVE REGULATION OF TYPE I INTERFERON-MEDIATED SIGNALING PATHWAY%GOBP%GO:0060339 | NEGATIVE REGULATION OF TYPE I INTERFERON-MEDIATED SIGNALING PATHWAY%GOBP%GO:0060339 | 0.8149089 | 2.196550 | 0.0000000 | 0.0365328 | 0.042 |
| NEGATIVE REGULATION OF UBIQUITIN-PROTEIN TRANSFERASE ACTIVITY%GOBP%GO:0051444 | NEGATIVE REGULATION OF UBIQUITIN-PROTEIN TRANSFERASE ACTIVITY%GOBP%GO:0051444 | 0.7817173 | 2.108648 | 0.0000000 | 0.0759830 | 0.166 |
| REGULATION OF DEFENSE RESPONSE TO VIRUS%GOBP%GO:0050688 | REGULATION OF DEFENSE RESPONSE TO VIRUS%GOBP%GO:0050688 | 0.6180260 | 2.085647 | 0.0000000 | 0.0696607 | 0.223 |
| THE HUMAN IMMUNE RESPONSE TO TUBERCULOSIS%WIKIPATHWAYS_20210210%WP4197%HOMO SAPIENS | THE HUMAN IMMUNE RESPONSE TO TUBERCULOSIS%WIKIPATHWAYS_20210210%WP4197%HOMO SAPIENS | 0.7641495 | 2.076919 | 0.0000000 | 0.0598717 | 0.249 |
| POSITIVE REGULATION OF DEFENSE RESPONSE TO VIRUS BY HOST%GOBP%GO:0002230 | POSITIVE REGULATION OF DEFENSE RESPONSE TO VIRUS BY HOST%GOBP%GO:0002230 | 0.7229613 | 2.067987 | 0.0000000 | 0.0532673 | 0.269 |
| REGULATION OF DEFENSE RESPONSE TO VIRUS BY HOST%GOBP%GO:0050691 | REGULATION OF DEFENSE RESPONSE TO VIRUS BY HOST%GOBP%GO:0050691 | 0.6435527 | 2.008794 | 0.0076923 | 0.0842204 | 0.449 |
| REGULATION OF ALTERNATIVE MRNA SPLICING, VIA SPLICEOSOME%GOBP%GO:0000381 | REGULATION OF ALTERNATIVE MRNA SPLICING, VIA SPLICEOSOME%GOBP%GO:0000381 | 0.6250964 | 2.000386 | 0.0000000 | 0.0801236 | 0.485 |
| HALLMARK_INTERFERON_ALPHA_RESPONSE%MSIGDB_C2%HALLMARK_INTERFERON_ALPHA_RESPONSE | HALLMARK_INTERFERON_ALPHA_RESPONSE%MSIGDB_C2%HALLMARK_INTERFERON_ALPHA_RESPONSE | 0.5657241 | 1.965435 | 0.0000000 | 0.1069109 | 0.641 |
| REGULATION OF UBIQUITIN PROTEIN LIGASE ACTIVITY%GOBP%GO:1904666 | REGULATION OF UBIQUITIN PROTEIN LIGASE ACTIVITY%GOBP%GO:1904666 | 0.7040978 | 1.924069 | 0.0000000 | 0.1423176 | 0.792 |
| TYPE I INTERFERON SIGNALING PATHWAY%GOBP%GO:0060337 | TYPE I INTERFERON SIGNALING PATHWAY%GOBP%GO:0060337 | 0.5328547 | 1.883854 | 0.0000000 | 0.1884126 | 0.898 |
| HALLMARK_INTERFERON_GAMMA_RESPONSE%MSIGDB_C2%HALLMARK_INTERFERON_GAMMA_RESPONSE | HALLMARK_INTERFERON_GAMMA_RESPONSE%MSIGDB_C2%HALLMARK_INTERFERON_GAMMA_RESPONSE | 0.5088842 | 1.877887 | 0.0000000 | 0.1793557 | 0.906 |
| ANTIVIRAL MECHANISM BY IFN-STIMULATED GENES%REACTOME%R-HSA-1169410.4 | ANTIVIRAL MECHANISM BY IFN-STIMULATED GENES%REACTOME%R-HSA-1169410.4 | 0.4983351 | 1.848086 | 0.0000000 | 0.2123345 | 0.954 |
| RESPONSE TO TYPE I INTERFERON%GOBP%GO:0034340 | RESPONSE TO TYPE I INTERFERON%GOBP%GO:0034340 | 0.5279398 | 1.842471 | 0.0000000 | 0.2066486 | 0.958 |
| REGULATION OF TYPE I INTERFERON-MEDIATED SIGNALING PATHWAY%GOBP%GO:0060338 | REGULATION OF TYPE I INTERFERON-MEDIATED SIGNALING PATHWAY%GOBP%GO:0060338 | 0.6174607 | 1.818549 | 0.0000000 | 0.2348676 | 0.984 |
| MYOBLAST DIFFERENTIATION%GOBP%GO:0045445 | MYOBLAST DIFFERENTIATION%GOBP%GO:0045445 | 0.6752113 | 1.814272 | 0.0143541 | 0.2268365 | 0.986 |
| CELLULAR RESPONSE TO TYPE I INTERFERON%GOBP%GO:0071357 | CELLULAR RESPONSE TO TYPE I INTERFERON%GOBP%GO:0071357 | 0.5328547 | 1.813603 | 0.0000000 | 0.2134491 | 0.986 |
| HOST-PATHOGEN INTERACTION OF HUMAN CORONA VIRUSES - INTERFERON INDUCTION%WIKIPATHWAYS_20210210%WP4880%HOMO SAPIENS | HOST-PATHOGEN INTERACTION OF HUMAN CORONA VIRUSES - INTERFERON INDUCTION%WIKIPATHWAYS_20210210%WP4880%HOMO SAPIENS | 0.5833607 | 1.807796 | 0.0000000 | 0.2100040 | 0.988 |
| TYPE I INTERFERON INDUCTION AND SIGNALING DURING SARS-COV-2 INFECTION%WIKIPATHWAYS_20210210%WP4868%HOMO SAPIENS | TYPE I INTERFERON INDUCTION AND SIGNALING DURING SARS-COV-2 INFECTION%WIKIPATHWAYS_20210210%WP4868%HOMO SAPIENS | 0.5841016 | 1.782053 | 0.0206897 | 0.2429643 | 0.994 |
| CD4-POSITIVE, ALPHA-BETA T CELL DIFFERENTIATION%GOBP%GO:0043367 | CD4-POSITIVE, ALPHA-BETA T CELL DIFFERENTIATION%GOBP%GO:0043367 | 0.6419525 | 1.759785 | 0.0228571 | 0.2701851 | 0.999 |
| REGULATION OF NUCLEASE ACTIVITY%GOBP%GO:0032069 | REGULATION OF NUCLEASE ACTIVITY%GOBP%GO:0032069 | 0.6240582 | 1.759063 | 0.0119048 | 0.2584494 | 0.999 |
Reload the result from A2
I decide to use Cytoscape locally. Since the Cytoscape can not recognize “—”, so I opened pos and neg result file and mannually changed “—” to 1. The app we use was EnrichmentMap in Cytoscape
There are 803 nodes and 4982 edges in the result. The parameter I used was FDR set to 0.1, node cutoff(p-value) set to 1.0, edge cutoff set to 0.375(default). The blue node shows the down-regulated group, and the red node shows the up-regulated group. The initial network was shown below.
Figure 3: Initial result
For annotation, I used the AutoAnnotate app in Cytoscape. All parameter are set to default. Cluster algorithm is MCL Cluster, the edge weight column set to similarity_coefficient, Max words per label was 3, adjacent word bonus was set to 8. below shows all the parameters for annotation.
Figure 4: Annotation setting
The publication ready figure was shown as below
Figure 5: Annotation setting
From the graph, there are two large cluster which are antigen proteasome degradation (60 nodes) and glycolytic process biosynthetic(67 nodes). Most of the node are blue, which means the gene was down-regulated.
After create summarized network, the major themes present is antigen proteasome degradation, this related to immunity responds(Rivett,A.J, 2004). This can related to the result from GSEA and g:Profiler. The novel pathway would be glycolytic process biosynthetic since this pathway was not really mentioned in either GSEA and g:Profiler results.
A summarized network was shown below, cluster layout was set to CoSE
Figure 6: Summarized network
The enrichment result not really close to the original paper. It is more close to the g:profiler result from thresholded methods. The difference are in the up-regulated group, but since this group only has a small number of data(only one red dot on the above graph), so the result change can not tell us too much.
There is evidence from Andersen et al.(2016), the article stated that MetS and obesity could cause inflammation and trigger immunu responds, this evidence directly supports the finding. This support that The top result for down-regulated genes are related to immune response, this could associated with MetS and obesity.
gmt_file <- file.path("data", "Supplementary_Table3_Human_GOBP_AllPathways_no_GO_iea_March_01_2021_symbol.gmt")
capture.output(genesets <- GSA::GSA.read.gmt(gmt_file), file = "gsa_loud.out")
# give proper indices/names
names(genesets$genesets) <- genesets$geneset.names
expression <- readRDS("normalized_data.rds")
# Load genes from enriched pathway
all_enr_genesets <- c(rownames(enr_file1), rownames(enr_file2))
genes_enr_gs <- c()
for(i in 1:length(all_enr_genesets)){
current_geneset <- unlist(genesets$genesets[which(genesets$geneset.names %in% all_enr_genesets[i])])
genes_enr_gs <- union(genes_enr_gs, current_geneset)
}
ranks <- read.table(file.path(getwd(), "data", "ranked_list.tsv"),
header = TRUE, sep = "\t", quote = "\"",
stringsAsFactors = FALSE)
FDR_threshold <- 0.001
#get the genes from the set of enriched pathwasy (no matter what threshold)
all_sig_enr_genesets<- c(rownames(enr_file1)[which(enr_file1[,"FDR.q.val"]<=FDR_threshold)], rownames(enr_file2)[which(enr_file2[,"FDR.q.val"]<=FDR_threshold)])
genes_sig_enr_gs <- c()
for(i in 1:length(all_sig_enr_genesets)){
current_geneset <- unlist(genesets$genesets[which(genesets$geneset.names %in% all_sig_enr_genesets[i])])
genes_sig_enr_gs <- union(genes_sig_enr_gs, current_geneset)
}
genes_all_gs <- unique(unlist(genesets$genesets))
A <- genes_all_gs
B <- genes_enr_gs
C <- rownames(expression)
venn_diagram_path <- file.path(getwd(),"image","dark_matter_overlaps.png")
png(venn_diagram_path)
VennDiagram::draw.triple.venn( area1=length(A), area2=length(B), area3 = length(C),
n12 = length(intersect(A,B)), n13=length(intersect(A,C)),
n23 = length(intersect(B,C)),
n123 = length(intersect(A,intersect(B,C))),
category = c("all genesets","all enrichment results","expression"),
fill = c("red","green","blue"),
cat.col = c("red","green","blue")
)
dev.off()
Figure 7: Venn Diagram
# significant genes that are not annotated to any pathways in entire set of pathways used for the analysis
genes_no_annotation <- setdiff(C, A)
ranked_gene_no_annotation <- ranks[which(ranks[,1] %in% genes_no_annotation),]
knitr::kable(ranked_gene_no_annotation[1:10,], "html", caption = "Ranked gene without annotation")
| NAME | TITLE | SCORE | X | |
|---|---|---|---|---|
| 3 | MTCO3P12 | NA | 1.5206945 | NA |
| 23 | IGHJ4 | NA | 0.8735703 | NA |
| 24 | RP11-290D2.6 | NA | 0.8424625 | NA |
| 31 | IGKJ5 | NA | 0.6906748 | NA |
| 34 | EVI2A | NA | 0.6600435 | NA |
| 43 | TRAJ7 | NA | 0.5888515 | NA |
| 44 | IGHJ6 | NA | 0.5831147 | NA |
| 45 | TRDD3 | NA | 0.5829422 | NA |
| 49 | IGKJ4 | NA | 0.5590599 | NA |
| 52 | IGHJ3P | NA | 0.5412626 | NA |
# significant genes that are not annotated to any of the pathways returned in the enrichment analysis
genes_no_annotation_enr <- setdiff(C, B)
ranked_gene_no_annotation_enr <- ranks[which(ranks[,1] %in% genes_no_annotation_enr),]
knitr::kable(ranked_gene_no_annotation[1:10,], "html", caption = "Ranked gene without annotation enrichment analysis")
| NAME | TITLE | SCORE | X | |
|---|---|---|---|---|
| 3 | MTCO3P12 | NA | 1.5206945 | NA |
| 23 | IGHJ4 | NA | 0.8735703 | NA |
| 24 | RP11-290D2.6 | NA | 0.8424625 | NA |
| 31 | IGKJ5 | NA | 0.6906748 | NA |
| 34 | EVI2A | NA | 0.6600435 | NA |
| 43 | TRAJ7 | NA | 0.5888515 | NA |
| 44 | IGHJ6 | NA | 0.5831147 | NA |
| 45 | TRDD3 | NA | 0.5829422 | NA |
| 49 | IGKJ4 | NA | 0.5590599 | NA |
| 52 | IGHJ3P | NA | 0.5412626 | NA |
we will use the same ComplexHeatmap and circlize R packages as in A2. 9,10,11
First, I will show the heatmap of any significant genes that are not annotated to any of the pathways returned in the enrichment analysis.
#get genes
enrich_heatmap_matrix <- expression[rownames(expression) %in% ranked_gene_no_annotation_enr$NAME,]
#normalize
enrich_heatmap_matrix <- t(scale(t(enrich_heatmap_matrix)))
if(min(enrich_heatmap_matrix) == 0){
heatmap_col = colorRamp2(c( 0, max(enrich_heatmap_matrix)),
c( "white", "red"))
} else {
heatmap_col = colorRamp2(c(min(enrich_heatmap_matrix), 0,
max(enrich_heatmap_matrix)),
c("blue", "white", "red"))
}
enrich_darkmatter_heatmap <- Heatmap(as.matrix(enrich_heatmap_matrix),
cluster_rows = TRUE,
cluster_columns = TRUE,
show_row_dend = TRUE,
show_column_dend = TRUE,
col=heatmap_col,
show_column_names = TRUE,
show_row_names = FALSE,
show_heatmap_legend = TRUE,
)
enrich_darkmatter_heatmap
Then, I will show the heatmap of any significant genes that are not annotated to any pathways in entire set of pathways used for the analysis
#get genes
all_heatmap_matrix <- expression[rownames(expression) %in% ranked_gene_no_annotation$NAME,]
#normalize
all_heatmap_matrix <- t(scale(t(all_heatmap_matrix)))
if(min(all_heatmap_matrix) == 0){
heatmap_col = colorRamp2(c( 0, max(all_heatmap_matrix)),
c( "white", "red"))
} else {
heatmap_col = colorRamp2(c(min(all_heatmap_matrix), 0,
max(all_heatmap_matrix)),
c("blue", "white", "red"))
}
all_darkmatter_heatmap <- Heatmap(as.matrix(all_heatmap_matrix),
cluster_rows = TRUE,
cluster_columns = TRUE,
show_row_dend = TRUE,
show_column_dend = TRUE,
col=heatmap_col,
show_column_names = TRUE,
show_row_names = FALSE,
show_heatmap_legend = TRUE,
)
all_darkmatter_heatmap
Andersen, Catherine J, Kelsey E Murphy, and Maria Luz Fernandez. 2016. “Impact of Obesity and Metabolic Syndrome on Immunity.” Advances in Nutrition 7 (1): 66–75. https://doi.org/10.3945/an.115.010207.
Chen, Hanbo. 2018. VennDiagram: Generate High-Resolution Venn and Euler Plots.
Davis, Sean, and Paul Meltzer. 2007. “GEOquery: A Bridge Between the Gene Expression Omnibus (Geo) and Bioconductor.” Bioinformatics 14: 1846–7.
Efron, Brad, and R. Tibshirani. 2019. GSA: Gene Set Analysis. http://www-stat.stanford.edu/~tibs/GSA.
Gu, Zuguang, Roland Eils, and Matthias Schlesner. 2016. “Complex Heatmaps Reveal Patterns and Correlations in Multidimensional Genomic Data.” Bioinformatics.
Gu, Zuguang, Lei Gu, Roland Eils, Matthias Schlesner, and Benedikt Brors. 2014. “Circlize Implements and Enhances Circular Visualization in R.” Bioinformatics 30 (19): 2811–2.
Jassal, Bijay, Lisa Matthews, Guilherme Viteri, Chuqiao Gong, Pascual Lorente, Antonio Fabregat, Konstantinos Sidiropoulos, et al. 2019. “The Reactome Pathway Knowledgebase.” Nucleic Acids Research. https://doi.org/10.1093/nar/gkz1031.
Kanehisa, Minoru. 2019. “Toward Understanding the Origin and Evolution of Cellular Organisms.” Protein Science 28 (11): 1947–51. https://doi.org/10.1002/pro.3715.
Kanehisa, Minoru, Miho Furumichi, Yoko Sato, Mari Ishiguro-Watanabe, and Mao Tanabe. 2020. “KEGG: Integrating Viruses and Cellular Organisms.” Nucleic Acids Research 49 (D1). https://doi.org/10.1093/nar/gkaa970.
McCarthy, Davis J., Yunshun Chen, and Gordon K. Smyth. 2012. “Differential Expression Analysis of Multifactor Rna-Seq Experiments with Respect to Biological Variation.” Nucleic Acids Research 40 (10): 4288–97. https://doi.org/10.1093/nar/gks042.
Mi, Huaiyu, Anushya Muruganujan, Dustin Ebert, Xiaosong Huang, and Paul D Thomas. 2018. “PANTHER Version 14: More Genomes, a New Panther Go-Slim and Improvements in Enrichment Analysis Tools.” Nucleic Acids Research 47 (D1). https://doi.org/10.1093/nar/gky1038.
Mootha, Vamsi K, Cecilia M Lindgren, Karl-Fredrik Eriksson, Aravind Subramanian, Smita Sihag, Joseph Lehar, Pere Puigserver, et al. 2003. “PGC-1α-Responsive Genes Involved in Oxidative Phosphorylation Are Coordinately Downregulated in Human Diabetes.” Nature Genetics 34 (3): 267–73. https://doi.org/10.1038/ng1180.
Morgan, Martin. 2019. BiocManager: Access the Bioconductor Project Package Repository. https://CRAN.R-project.org/package=BiocManager.
Müller, Kirill, Hadley Wickham, David A. James, and Seth Falcon. 2020. RSQLite: ’SQLite’ Interface for R. https://CRAN.R-project.org/package=RSQLite.
Paczkowska-Abdulsalam, Magdalena, Magdalena Niemira, Agnieszka Bielska, Anna Szałkowska, Beata Anna Raczkowska, Sini Junttila, Attila Gyenesei, et al. 2020. “Evaluation of Transcriptomic Regulations Behind Metabolic Syndrome in Obese and Lean Subjects.” International Journal of Molecular Sciences 21 (4): 1455. https://doi.org/10.3390/ijms21041455.
Reimand, Jüri, Meelis Kull, Hedi Peterson, Jaanus Hansen, and Jaak Vilo. 2007. “G:Profiler—a Web-Based Toolset for Functional Profiling of Gene Lists from Large-Scale Experiments.” Nucleic Acids Research 35. https://doi.org/10.1093/nar/gkm226.
Ritchie, Matthew E, Belinda Phipson, Di Wu, Yifang Hu, Charity W Law, Wei Shi, and Gordon K Smyth. 2015. “limma Powers Differential Expression Analyses for RNA-Sequencing and Microarray Studies.” Nucleic Acids Research 43 (7): e47. https://doi.org/10.1093/nar/gkv007.
Rivett, A. J., and Arron R. Hearn. 2004. “Proteasome Function in Antigen Presentation: Immunoproteasome Complexes, Peptide Production, and Interactions with Viral Proteins.” Current Protein and Peptide Science 5 (3): 153–61. https://doi.org/10.2174/1389203043379774.
Robinson, Mark D, Davis J McCarthy, and Gordon K Smyth. 2010. “EdgeR: A Bioconductor Package for Differential Expression Analysis of Digital Gene Expression Data.” Bioinformatics 26 (1): 139–40. https://doi.org/10.1093/bioinformatics/btp616.
Shannon, P. 2003. “Cytoscape: A Software Environment for Integrated Models of Biomolecular Interaction Networks.” Genome Research 13 (11): 2498–2504. https://doi.org/10.1101/gr.1239303.
Slenter, Denise N, Martina Kutmon, Kristina Hanspers, Anders Riutta, Jacob Windsor, Nuno Nunes, Jonathan Mélius, et al. 2017. “WikiPathways: A Multifaceted Pathway Database Bridging Metabolomics to Other Omics Research.” Nucleic Acids Research 46 (D1). https://doi.org/10.1093/nar/gkx1064.
Subramanian, A., P. Tamayo, V. K. Mootha, S. Mukherjee, B. L. Ebert, M. A. Gillette, A. Paulovich, et al. 2005. “Gene Set Enrichment Analysis: A Knowledge-Based Approach for Interpreting Genome-Wide Expression Profiles.” Proceedings of the National Academy of Sciences 102 (43): 15545–50. https://doi.org/10.1073/pnas.0506580102.
Temple Lang, Duncan. 2020. RCurl: General Network (Http/Ftp/...) Client Interface for R. https://CRAN.R-project.org/package=RCurl.
Wickham, Hadley. 2016. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.
Wickham, Hadley, Romain Fran?ois, Lionel Henry, and Kirill Müller. 2020. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.
Xie, Yihui. 2020. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://yihui.org/knitr/.
Zhu, Yuelin, Sean Davis, Robert Stephens, Paul S. Meltzer, and Yidong Chen. 2008. “GEOmetadb: Powerful Alternative Search Engine for the Gene Expression Omnibus.” Bioinformatics (Oxford, England) 24 (23): 2798–2800. https://doi.org/10.1093/bioinformatics/btn520.